Learning to Optimize via Posterior Sampling
Authors: Daniel Russo, Benjamin Van Roy
Abstract
Similar resources
Learning to Optimize via Posterior Sampling
Learning to Optimize via Information-Directed Sampling
This paper proposes information-directed sampling, a new algorithm for balancing exploration and exploitation in online optimization problems in which a decision-maker must learn from partial feedback. The algorithm quantifies the amount learned by selecting an action through an information-theoretic measure: the mutual information between the true optimal action and the algorithm's next...
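The exploration–exploitation balance described in this abstract can be sketched for a two-armed Bernoulli bandit. This is a minimal illustration, not the paper's algorithm: it uses the variance-based approximation of the information gain, Monte Carlo posterior samples in place of exact mutual information, and hypothetical arm means; every name below is an assumption of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-armed Bernoulli bandit with Beta posteriors.
true_means = np.array([0.4, 0.6])
alpha = np.ones(2)   # Beta posterior: successes + 1 per arm
beta = np.ones(2)    # Beta posterior: failures + 1 per arm
M = 2000             # posterior samples drawn per step

def ids_action(alpha, beta):
    # Draw posterior samples of each arm's mean reward.
    theta = rng.beta(alpha, beta, size=(M, 2))    # shape (M, 2)
    a_star = theta.argmax(axis=1)                 # sampled optimal arm
    p_star = np.array([(a_star == a).mean() for a in range(2)])
    mean = theta.mean(axis=0)                     # E[reward | arm]
    # Expected regret of each arm under the posterior.
    regret = theta.max(axis=1).mean() - mean
    # Variance-based information gain: how much an arm's mean
    # shifts with the identity of the optimal arm.
    info = np.zeros(2)
    for a in range(2):
        for s in range(2):
            if p_star[s] > 0:
                cond_mean = theta[a_star == s, a].mean()
                info[a] += p_star[s] * (cond_mean - mean[a]) ** 2
    # Pick the arm minimizing the information ratio:
    # squared expected regret divided by information gain.
    ratio = regret ** 2 / np.maximum(info, 1e-12)
    return int(ratio.argmin())

for t in range(500):
    a = ids_action(alpha, beta)
    r = rng.random() < true_means[a]
    alpha[a] += r
    beta[a] += 1 - r

print(alpha / (alpha + beta))  # posterior mean estimate per arm
```

The information ratio makes an arm attractive either because its expected regret is small or because pulling it reveals a lot about which arm is optimal.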
(More) Efficient Reinforcement Learning via Posterior Sampling
Most provably-efficient reinforcement learning algorithms introduce optimism about poorly-understood states and actions to encourage exploration. We study an alternative approach for efficient exploration: posterior sampling for reinforcement learning (PSRL). This algorithm proceeds in repeated episodes of known duration. At the start of each episode, PSRL updates a prior distribution over Mark...
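The episodic loop this abstract describes — sample MDP parameters from the posterior, compute the optimal policy for the sample, execute it, update the posterior — can be sketched on a tiny tabular MDP. The MDP below (two states, two actions, known rewards, Dirichlet posterior over transition rows) is a hypothetical setup for illustration, not an example from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical two-state, two-action MDP: transitions unknown,
# rewards known. PSRL-style loop with a Dirichlet posterior
# over each transition row.
S, A, H = 2, 2, 10          # states, actions, episode length
R = np.array([[0.0, 0.0],   # R[s, a]: action 1 in state 1 pays off
              [0.0, 1.0]])
true_P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                   [[0.7, 0.3], [0.1, 0.9]]])  # true_P[s, a] = next-state dist
dirichlet = np.ones((S, A, S))                 # posterior counts

def plan(P):
    # Finite-horizon value iteration for the sampled MDP.
    V = np.zeros(S)
    pi = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = R + P @ V            # Q[s, a] = R[s, a] + sum_s' P[s, a, s'] V[s']
        pi[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return pi

for episode in range(200):
    # Sample one MDP from the posterior, plan for it, execute the policy.
    P = np.array([[rng.dirichlet(dirichlet[s, a]) for a in range(A)]
                  for s in range(S)])
    pi = plan(P)
    s = 0
    for h in range(H):
        a = pi[h, s]
        s_next = rng.choice(S, p=true_P[s, a])
        dirichlet[s, a, s_next] += 1   # conjugate posterior update
        s = s_next

print(dirichlet.sum())
```

Because the policy is optimal for a *sampled* MDP rather than an optimistic one, exploration arises naturally from posterior uncertainty: poorly understood state–action pairs still have a real chance of looking best in the sample.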
Distributed Bayesian Posterior Sampling via Moment Sharing
We propose a distributed Markov chain Monte Carlo (MCMC) inference algorithm for large scale Bayesian posterior simulation. We assume that the dataset is partitioned and stored across nodes of a cluster. Our procedure involves an independent MCMC posterior sampler at each node based on its local partition of the data. Moment statistics of the local posteriors are collected from each sampler and...
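The idea of sharing moment statistics instead of samples can be sketched with a deliberately simple stand-in: each node summarizes its local posterior by a Gaussian (mean and variance) and the summaries are combined by multiplying the Gaussians, so precisions add. This consensus-style combination is a simplification of the paper's procedure, and the data, node counts, and diffuse prior below are assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: infer the mean of Gaussian data (known unit
# variance) partitioned across nodes. Each node summarizes its local
# posterior by two moments; only moments are communicated.
true_mu, n_nodes, n_per_node = 3.0, 4, 250
shards = [rng.normal(true_mu, 1.0, n_per_node) for _ in range(n_nodes)]

def local_posterior_moments(x, n_samples=5000, prior_var=100.0):
    # Conjugate local posterior under a diffuse prior; independent
    # draws stand in for the per-node MCMC chain of the real method.
    post_var = 1.0 / (1.0 / prior_var + len(x))
    post_mean = post_var * x.sum()
    samples = rng.normal(post_mean, np.sqrt(post_var), n_samples)
    return samples.mean(), samples.var()

# Combine local moments as a product of Gaussian approximations:
# precisions add, precision-weighted means add.
means, variances = zip(*(local_posterior_moments(x) for x in shards))
prec = sum(1.0 / v for v in variances)
combined_mean = sum(m / v for m, v in zip(means, variances)) / prec
combined_var = 1.0 / prec

print(combined_mean, combined_var)
```

With 1,000 points in total, the combined estimate concentrates near the true mean even though no raw data or samples ever leave a node; only two numbers per node are exchanged.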
Posterior Sampling for Large Scale Reinforcement Learning
Posterior sampling for reinforcement learning (PSRL) is a popular algorithm for learning to control an unknown Markov decision process (MDP). PSRL maintains a distribution over MDP parameters and in an episodic fashion samples MDP parameters, computes the optimal policy for them and executes it. A special case of PSRL is where at the end of each episode the MDP resets to the initial state distr...
Journal
Journal title: Mathematics of Operations Research
Year: 2014
ISSN: 0364-765X, 1526-5471
DOI: 10.1287/moor.2014.0650